Abstract

Chentsov studied Riemannian metrics on the set of probability measures from
the point of view of decision theory. He proved that, up to a constant factor,
the Fisher information is the only metric which is monotone under stochastic
transformations. The present paper deals with monotone metrics on the space of
finite density matrices, motivated by quantum mechanics. A characterization of
those metrics is given in terms of operator monotone functions. Several
concrete metrics are constructed and analyzed. In particular, instead of the
uniqueness found in the probabilistic case, there is a large class of monotone
metrics, some of which appeared long ago in the physics literature. Moreover,
a limiting procedure to pure states is discussed.
1 Introduction



*Published in Geometry in Present Day Science, eds. O.E. Barndorff-Nielsen and
E.B. Vedel Jensen, 21-34 (World Scientific, 1999); written version of the
conference talk at Aarhus University in 1997.

The idea of statistical distance between two probability distributions goes back
to Fisher, who was interested in a quantity which shows how difficult it is to
decide between two probability measures by statistical sampling. He found that
the spherical representation of the probability simplex is adequate. The
probability distributions $(p_1, p_2, \dots, p_n)$ on $n$ points form an
$(n-1)$-dimensional simplex $\mathcal{S}_{n-1}$, since $p_i \ge 0$ and
$\sum_i p_i = 1$. If we introduce the parameters $z_i = 2\sqrt{p_i}$, then
$\sum_i z_i^2 = 4$ and the probability simplex is parametrized as a portion of
the sphere of radius 2 in $\mathbb{R}^n$. Let $z(t)$ be a curve on the sphere.
The square of the length of the tangent is

$$\langle \partial_t z, \partial_t z \rangle = \sum_i (\partial_t z_i)^2 = \sum_i p_i(t)\,(\partial_t \log p_i(t))^2, \tag{1}$$

which is the Fisher information. The geodesic distance between two probability
distributions $Q = (q_1, \dots, q_n)$ and $R = (r_1, \dots, r_n)$ can be
computed along a great circle and it is

$$d(Q, R) = 2\arccos\Big(\sum_{i=1}^n \sqrt{q_i r_i}\Big).$$

One observes that the geodesic distance is a simple transform of the Hellinger
distance. Namely,

$$d_H(Q, R) = \sqrt{\sum_{i=1}^n \big(q_i^{1/2} - r_i^{1/2}\big)^2} = 2\sin\big(d(Q, R)/4\big).$$
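As a quick numerical illustration of the relation between the two distances, the following Python sketch evaluates the geodesic distance and the Hellinger distance for two example distributions and compares $d_H$ with $2\sin(d/4)$. The function names and the sample distributions are chosen here for illustration only.

```python
import math

def fisher_geodesic_distance(q, r):
    """Geodesic distance 2*arccos(sum_i sqrt(q_i r_i)) on the sphere of radius 2."""
    overlap = sum(math.sqrt(qi * ri) for qi, ri in zip(q, r))
    # Guard against rounding slightly above 1 before calling acos.
    return 2.0 * math.acos(min(1.0, overlap))

def hellinger_distance(q, r):
    """Euclidean distance between the points (sqrt(q_i)) and (sqrt(r_i))."""
    return math.sqrt(sum((math.sqrt(qi) - math.sqrt(ri)) ** 2
                         for qi, ri in zip(q, r)))

# Illustrative distributions on three points (not taken from the paper).
Q = [0.5, 0.3, 0.2]
R = [0.1, 0.6, 0.3]

d = fisher_geodesic_distance(Q, R)
d_h = hellinger_distance(Q, R)
print("geodesic distance :", d)
print("Hellinger distance:", d_h)
print("2 sin(d/4)        :", 2.0 * math.sin(d / 4.0))
```

The last two printed numbers should agree up to rounding, since $\sum_i (\sqrt{q_i} - \sqrt{r_i})^2 = 2 - 2\cos(d/2) = 4\sin^2(d/4)$.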

In applications of mathematical statistics one often meets a family of
distributions parametrized by a real number or, more generally, by
$\theta \in \mathbb{R}^m$. An example is the family $N(\mu, \sigma)$ of normal
distributions with mean $\mu \in \mathbb{R}$ and variance
$\sigma \in \mathbb{R}^+$. An $m$-tuple $(\xi_1, \xi_2, \dots, \xi_m)$ of random
variables is called an unbiased estimator of the parameter $\theta$ if
$E(\xi_i) = \theta_i$ for $1 \le i \le m$. In statistical problems an unbiased
estimator can be used to estimate the true value of the parameter $\theta$ on
the basis of a sample. The variance of the estimator should be small for the
estimation to be efficient, and the classical Cramér-Rao inequality addresses
this point: the $m \times m$ covariance matrix
$E(\xi_i \xi_j) - E(\xi_i)E(\xi_j)$ is always at least as large, in the sense
of the positive semidefinite order, as the inverse of the Fisher information
matrix. The latter is independent of the estimator
$(\xi_1, \xi_2, \dots, \xi_m)$, and one desirable property of an unbiased
estimator is closeness of the covariance matrix to the inverse Fisher
information matrix.
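The inequality can be made concrete in the simplest one-parameter case ($m = 1$). The sketch below is an illustration under assumptions of our own: a two-point family $p_\theta = (\theta, 1 - \theta)$ sampled twice independently, with two hypothetical unbiased estimators. It uses the standard fact that Fisher information adds over independent samples.

```python
def fisher_information(theta):
    """Fisher information of the two-point family p = (theta, 1 - theta)."""
    probs = (theta, 1.0 - theta)
    scores = (1.0 / theta, -1.0 / (1.0 - theta))  # d/dtheta of log p_i
    return sum(p * s * s for p, s in zip(probs, scores))

def variance(values, probs):
    """Variance of a random variable given its values and their probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

theta = 0.3  # illustrative parameter value

# Two independent draws; outcome 1 has probability theta, outcome 0 has 1 - theta.
outcomes = [(a, b) for a in (1, 0) for b in (1, 0)]
probs = [(theta if a else 1.0 - theta) * (theta if b else 1.0 - theta)
         for a, b in outcomes]

sample_mean = [(a + b) / 2.0 for a, b in outcomes]  # unbiased, uses both draws
first_draw = [float(a) for a, b in outcomes]        # unbiased, ignores the second draw

# Cramer-Rao bound for two independent draws: 1 / (2 * I(theta)).
bound = 1.0 / (2.0 * fisher_information(theta))
var_mean = variance(sample_mean, probs)
var_first = variance(first_draw, probs)
print("bound:", bound, "sample mean:", var_mean, "first draw:", var_first)
```

In this example the sample mean attains the bound, while the estimator that discards the second draw has twice the variance.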

In quantum mechanics, the state space of an $n$-level system is identified with
the set of all $n \times n$ positive semidefinite complex matrices of trace 1;
these are the so-called density matrices. Let $\mathcal{M}_n$ stand for the set
of all positive definite density matrices. We can parametrize
$D = (D_{ij}) \in \mathcal{M}_n$ by the real numbers $\mathrm{Re}\,D_{ij}$,
$\mathrm{Im}\,D_{ij}$ ($1 \le i < j \le n$) and by the positive numbers $D_{ii}$
($1 \le i \le n-1$). In this way $\mathcal{M}_n$ may be embedded into the
Euclidean $k$-space with $k = n^2 - 1$ and becomes a manifold. At each point
$D \in \mathcal{M}_n$ the tangent space $T_D(\mathcal{M}_n)$ is identified with
the set of all traceless selfadjoint matrices. One observes that the probability
simplex is embedded into $\mathcal{M}_n$, since every probability distribution
on the $n$-point space gives a diagonal density matrix in the obvious way:

$$\mathcal{S}_{n-1} \ni (p_1, p_2, \dots, p_n) \mapsto \mathrm{Diag}(p_1, p_2, \dots, p_n) \in \mathcal{M}_n.$$

The aim of the present paper is a search for possible Riemannian metrics
on the space of density matrices of a finite dimensional space. Without some
restrictions this would be pointless; the emphasis is put on statistically
relevant metrics which recover the Fisher information metric on the submanifold
of probability distributions.



2 Chentsov's approach to the problem 

Chentsov was led by decision theory when he considered a category whose objects
are probability spaces and whose morphisms are Markov kernels. Although he
worked in [3] with arbitrary probability spaces, his idea can be demonstrated
very well on finite ones. In this case a Markov kernel from the probability
$(n-1)$-simplex $\mathcal{S}_{n-1}$ to an $(m-1)$-simplex $\mathcal{S}_{m-1}$ is
an $m \times n$ stochastic matrix. If $\Pi$ is such a matrix and
$P \in \mathcal{S}_{n-1}$, then $\Pi P \in \mathcal{S}_{m-1}$ is considered more
random than $P$. If we want to represent probability distributions as column
vectors, then the matrix $\Pi$ has to be column-stochastic, that is,
$\sum_i \Pi_{ij} = 1$ for every $j$. An example of randomization comes from the
identification of two outcomes of our random experiment. This is described by a
0-1 matrix with one 1 in each row except for one row, which contains two 1's.
In the statistical physics literature the term coarse graining is used more
often than randomization, but they stand for the same concept.

Generally speaking, the parametrized family $(Q_i)$ is more random than the
parametrized family $(P_i)$ (with the same parameter set) if there exists a
stochastic matrix $\Pi$ such that $\Pi P_i = Q_i$ for every value of the
parameter $i$. Two parametric families $(P_i)$ and $(Q_i)$ are equivalent in the
theory of statistical inference if there are two stochastic matrices
$\Pi^{(12)}$ and $\Pi^{(21)}$ such that

$$\Pi^{(12)} P_i = Q_i \quad\text{and}\quad \Pi^{(21)} Q_i = P_i \tag{2}$$

for every $i$. Chentsov defined a numerical function $f$ given on pairs of
measures to be invariant if

$$(P_1, P_2) \sim (Q_1, Q_2) \quad\text{implies}\quad f(P_1, P_2) = f(Q_1, Q_2) \tag{3}$$

and monotone if

$$f(P_1, P_2) \ge f(\Pi P_1, \Pi P_2) \tag{4}$$
for every stochastic matrix $\Pi$. A monotone function $f$ is obviously
invariant. Statistics and information theory know many monotone functions, for
example the relative entropy

$$S(P, Q) = \sum_i p_i(\log p_i - \log q_i) \tag{5}$$

and its generalizations. If a Riemannian metric is given on all probability
simplexes, then this family of metrics is called invariant (respectively,
monotone) if the corresponding geodesic distance is an invariant (respectively,
monotone) function. Chentsov's great achievement was to show that, up to a
constant factor, the Fisher information (1) yields the only monotone family of
Riemannian metrics on the class of finite probability simplexes ([3]).
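Monotonicity under stochastic matrices can be checked numerically on small examples. The following sketch applies two column-stochastic matrices, one merging two outcomes and one generic, to a pair of distributions and compares the relative entropy (5) and the Fisher geodesic distance before and after. The matrices, distributions and function names are illustrative assumptions, not taken from the paper.

```python
import math

def apply_stochastic(matrix, p):
    """Apply a column-stochastic matrix to a probability (column) vector."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in matrix]

def relative_entropy(p, q):
    """S(P, Q) = sum_i p_i (log p_i - log q_i), with 0 log 0 read as 0."""
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

def fisher_distance(p, q):
    """Geodesic distance of the Fisher metric, 2*arccos(sum_i sqrt(p_i q_i))."""
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, overlap))

# Identification of outcomes 2 and 3: one 1 per row except a row with two 1's.
MERGE = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 1.0]]
# A generic column-stochastic matrix (every column sums to 1).
NOISE = [[0.8, 0.1, 0.3],
         [0.1, 0.7, 0.3],
         [0.1, 0.2, 0.4]]

P = [0.5, 0.3, 0.2]
Q = [0.1, 0.6, 0.3]

for name, matrix in (("merge", MERGE), ("noise", NOISE)):
    tp, tq = apply_stochastic(matrix, P), apply_stochastic(matrix, Q)
    print(name,
          "entropy:", relative_entropy(P, Q), "->", relative_entropy(tp, tq),
          "distance:", fisher_distance(P, Q), "->", fisher_distance(tp, tq))
```

Both quantities should not increase under either matrix, in line with (4).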

A decade later Chentsov turned to the quantum case, where the probability
simplex is replaced by the set of density matrices. A linear mapping between two
matrix spaces sends density matrices into density matrices if the mapping
preserves trace and positivity (i.e., positive semidefiniteness). By now it is
well understood that complete positivity is a natural and important requirement
in the quantum case. Therefore, we call a trace preserving completely positive
mapping stochastic. One of the equivalent forms of the complete positivity of a
map $T$ is the following:

$$\sum_{i=1}^n \sum_{j=1}^n a_i^* T(b_i^* b_j) a_j \ge 0$$

for all possible choices of $a_i$, $b_i$ and $n$. A completely positive mapping
$T$ satisfies the Schwarz inequality: $T(a^*a) \ge T(a)^*T(a)$.

Chentsov recognized that stochastic mappings are the appropriate morphisms
in the category of quantum state spaces. (The monograph [1] contains more
information about stochastic mappings, see also [10].) The above definitions of
invariance and monotonicity make sense when stochastic matrices are replaced by
stochastic mappings. Chentsov (with Morozova) aimed to find the invariant (or
monotone) Riemannian metrics in the quantum setting as well. They obtained the
following result ([12]). Assume that a family of Riemannian metrics is given on
all spaces of density matrices and that this family is invariant. Then there
exist a function $c(x, y)$ and a constant $C$ such that the squared length of a
tangent vector $A = (A_{ij})$ at a diagonal point
$D = \mathrm{Diag}(p_1, p_2, \dots, p_n)$ is of the form

$$C \sum_{k=1}^n p_k^{-1} A_{kk}^2 + 2 \sum_{j<k} c(p_j, p_k)\,|A_{jk}|^2. \tag{6}$$

Furthermore, the function $c(x, y)$ is symmetric and
$c(\lambda x, \lambda y) = \lambda^{-1} c(x, y)$. This result of Morozova and
Chentsov was not complete. Although they had proposals for the function
$c(x, y)$, they did not prove monotonicity or invariance of any of the
corresponding metrics. A complete result was obtained in [14] and [15], but
before presenting it here we make a few comments on (6).

Both the function $c(x, y)$ and the constant $C$ are independent of the matrix
size $n$. Restricting ourselves to diagonal matrices, which is in some sense a
step back to the probability simplex, we can see that there is no ambiguity in
the metric. Loosely speaking, the uniqueness result of the simplex case survives
along the diagonal, and the offdiagonal provides new possibilities for the
definition of a stochastically invariant metric on the space $\mathcal{M}$ of
invertible density matrices. In other words, the tangent space
$T_D(\mathcal{M})$ at $D$ decomposes as

$$T_D(\mathcal{M}) = T_D(\mathcal{M})^c \oplus T_D(\mathcal{M})^o, \tag{7}$$

where $T_D(\mathcal{M})^c = \{A \in T_D(\mathcal{M}) : [A, D] = 0\}$ and
$T_D(\mathcal{M})^o$ is the orthogonal complement of $T_D(\mathcal{M})^c$ with
respect to the Hilbert-Schmidt inner product of matrices. The monotone metric is
unique on $T_D(\mathcal{M})^c$,

$$K_D(A, A) = C\,\mathrm{Tr}\,D^{-1}A^2 \quad\text{if}\quad A \in T_D(\mathcal{M})^c, \tag{8}$$

and the function $c(x, y)$ determines the metric on the orthogonal complement.

If a distance between density matrices expresses statistical
distinguishability, then this distance must decrease under coarse-graining. A
good example of coarse-graining arises when a density matrix is partitioned in
the form of a $2 \times 2$ block matrix, and the coarse-graining forgets about
the offdiagonal:

$$\begin{pmatrix} A & B \\ B^* & C \end{pmatrix} \mapsto \begin{pmatrix} A & 0 \\ 0 & C \end{pmatrix}.$$



In the mathematical formulation, a coarse-graining is a completely positive
mapping which preserves the trace and hence sends density matrices into density
matrices. Such a mapping will be called stochastic below. A Riemannian metric
is defined to be monotone if the differential of any stochastic mapping is a
contraction (in the sense that it does not increase the norm). If the affine
parametrization is considered, then $D_t = D + tA$ is a curve for an invertible
density $D$ and for a selfadjoint traceless $A$. Under a stochastic mapping $T$
this curve is transformed into $T(D_t) = T(D) + tT(A)$, provided that $T(D)$ is
an invertible density and the real number $t$ is small enough. The monotonicity
condition for the Riemannian metric $g$ on $\mathcal{M}_n$ reads as

$$g_{T(D)}(T(A), T(A)) \le g_D(A, A) \tag{9}$$

for any invertible density $D$, for any traceless selfadjoint matrix $A$ and for
any stochastic mapping $T$. Our goal is to show many examples of monotone
metrics and to give their characterization in terms of operator monotone
functions.
3 Monotone metrics



Let us recall that a function $f : \mathbb{R}^+ \to \mathbb{R}$ is called
operator monotone if the relation $0 \le K \le H$ implies
$0 \le f(K) \le f(H)$ for any matrices $K$ and $H$ (of any order). The theory of
operator monotone functions was established in the 1930's by Löwner and there
are several reviews on the subject; for example, [2] and [5] are suggested.

The following result was obtained in [15].

Theorem 3.1. There exists a one-to-one correspondence between monotone metrics
and operator monotone functions $f : \mathbb{R}^+ \to \mathbb{R}^+$ such that
$f(t) = t f(t^{-1})$. If $D = \mathrm{Diag}(p_1, p_2, \dots, p_n)$, then the
metric corresponding to $f$ is of the form

$$\sum_{j=1}^n \sum_{k=1}^n c(p_j, p_k)\,|A_{jk}|^2, \tag{10}$$

where $c(x, y) = 1/(y f(x/y))$.



The proof of this result is given in the original paper. Here we remark
that the metric (6) can be written by means of a certain function $f$ such that
$c(x, y) = 1/(y f(x/y))$ holds. The point is to demonstrate, on the one hand,
that this function $f$ must be operator monotone and, on the other hand, that
every operator monotone function provides a monotone metric. The symmetry
condition $f(t) = t f(t^{-1})$ is equivalent to the condition that the
Riemannian inner product is real valued on the selfadjoint tangent vectors. It
seems natural to normalize metrics in such a way that the standard Fisher metric
appears on the submanifold of diagonal matrices. In this case one can say,
following Uhlmann, that the metric is Fisher-adjusted. This normalization is
equivalent to the condition $f(1) = 1$. Below we always assume that
$f(1) = 1$, that is, we restrict our discussion to Fisher-adjusted metrics.
Some examples of functions $f$ satisfying the hypothesis of Theorem 3.1 are the
following:

$$\frac{2x^{\alpha + 1/2}}{1 + x^{2\alpha}}, \quad \frac{x - 1}{\log x}, \quad \frac{x - 1}{\log x}\,\frac{2\sqrt{x}}{1 + x}, \quad \left(\frac{x - 1}{\log x}\right)^2 \frac{2}{1 + x}, \quad \frac{1 + x}{2}, \tag{11}$$

where $0 \le \alpha \le 1/2$.
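The scalar properties required of these functions are easy to test numerically. The sketch below assumes the five example functions read as $2x^{\alpha+1/2}/(1+x^{2\alpha})$, $(x-1)/\log x$, $\frac{x-1}{\log x}\frac{2\sqrt{x}}{1+x}$, $\big(\frac{x-1}{\log x}\big)^2\frac{2}{1+x}$ and $(1+x)/2$, and checks the symmetry $f(t) = t f(t^{-1})$ and the position between $2x/(1+x)$ and $(1+x)/2$ at a few sample points. It does not test operator monotonicity itself; all names are illustrative.

```python
import math

def f_power(x, alpha):
    return 2.0 * x ** (alpha + 0.5) / (1.0 + x ** (2.0 * alpha))

def f_log(x):
    return (x - 1.0) / math.log(x)  # undefined at x = 1; the limit there is 1

def f_log_geometric(x):
    return f_log(x) * 2.0 * math.sqrt(x) / (1.0 + x)

def f_log_squared(x):
    return f_log(x) ** 2 * 2.0 / (1.0 + x)

def f_arithmetic(x):
    return (1.0 + x) / 2.0

def f_harmonic(x):
    return 2.0 * x / (1.0 + x)

def morozova_chentsov(f, x, y):
    """c(x, y) = 1 / (y f(x / y))."""
    return 1.0 / (y * f(x / y))

EXAMPLES = {
    "power, alpha = 0": lambda x: f_power(x, 0.0),
    "power, alpha = 1/4": lambda x: f_power(x, 0.25),
    "power, alpha = 1/2": lambda x: f_power(x, 0.5),
    "logarithmic": f_log,
    "logarithmic x geometric": f_log_geometric,
    "logarithmic squared": f_log_squared,
    "arithmetic": f_arithmetic,
}
POINTS = [0.05, 0.3, 2.0, 7.5, 40.0]  # sample points away from x = 1

def is_symmetric(f):
    return all(abs(f(x) - x * f(1.0 / x)) <= 1e-9 * max(1.0, f(x)) for x in POINTS)

def is_between_means(f):
    return all(f_harmonic(x) * (1 - 1e-9) <= f(x) <= f_arithmetic(x) * (1 + 1e-9)
               for x in POINTS)

for name, f in EXAMPLES.items():
    print(name, is_symmetric(f), is_between_means(f), f(1.0 + 1e-6))
```

The last printed column should be close to 1, reflecting the normalization $f(1) = 1$.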

It is worthwhile to note that Kubo and Ando established a correspondence
between operator monotone functions and means of positive operators. Our
condition $f(t) = t f(t^{-1})$ on the operator monotone function $f$ is
equivalent to the symmetry of the corresponding operator mean. The smallest
mean is the harmonic one. This corresponds to the function
$f(t) = 2t/(t + 1)$ and gives the metric

$$g_D^{h}(A, B) = \tfrac{1}{2}\,\mathrm{Tr}\,D^{-1}(AB + BA). \tag{12}$$

Since a larger function $f$ yields a smaller metric, we have

Theorem 3.2. The Riemannian metric (12) is monotone and it is the largest
among all Fisher-adjusted monotone metrics.



One can see the monotonicity of (12) directly. The operator inequality

$$T(K)\,T(D)^{-1}\,T(K)^* \le T(K D^{-1} K^*) \tag{13}$$

holds for every positive invertible $D$ and for every stochastic mapping $T$
([4], [11]). Taking the trace of both sides of (13), we conclude monotonicity.
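A small numerical instance of this monotonicity can be sketched as follows. The example assumes the metric (12) in the form $g_D(A, A) = \mathrm{Tr}\,D^{-1}A^2$ and uses the $2 \times 2$ coarse-graining that deletes the off-diagonal entries, which can be written as $T(X) = \frac{1}{2}(X + \sigma_3 X \sigma_3)$ and is therefore completely positive and trace preserving. The matrices $D$ and $A$ are illustrative choices.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(X):
    """Inverse of an invertible 2 x 2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def largest_metric(D, A):
    """Tr D^{-1} A^2, the metric of the harmonic mean evaluated on (A, A)."""
    M = matmul(inverse(D), matmul(A, A))
    return (M[0][0] + M[1][1]).real

def pinch(X):
    """Coarse-graining that forgets the off-diagonal entries."""
    return [[X[0][0], 0.0], [0.0, X[1][1]]]

# An invertible density matrix and a traceless selfadjoint tangent vector.
D = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]
A = [[0.1, 0.05 - 0.2j], [0.05 + 0.2j, -0.1]]

before = largest_metric(D, A)
after = largest_metric(pinch(D), pinch(A))
print("g_D(A, A)          =", before)
print("g_T(D)(T(A), T(A)) =", after)
```

The second value should not exceed the first, as required by (9).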

The arithmetic operator mean is the largest symmetric mean and it gives the
smallest metric, which is usually called the metric of the symmetric logarithmic
derivative.

Theorem 3.3. Among all Fisher-adjusted monotone metrics the smallest one is
given as

$$g_D^{SL}(A, B) = \mathrm{Tr}\,AG, \tag{14}$$

where $G$ is the unique solution of the equation

$$DG + GD = 2B. \tag{15}$$
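For a diagonal $D = \mathrm{Diag}(p_1, \dots, p_n)$ equation (15) can be solved entrywise, $G_{jk} = 2B_{jk}/(p_j + p_k)$. The following sketch, with illustrative data and function names, solves (15) in this way and compares $\mathrm{Tr}\,AG$ for $B = A$ with the sum $\sum_{j,k} c(p_j, p_k)|A_{jk}|^2$, where $c(x, y) = 2/(x + y)$ is the function obtained from $f(t) = (1 + t)/2$.

```python
def solve_sld(p, B):
    """Solve D G + G D = 2 B for D = Diag(p): G_jk = 2 B_jk / (p_j + p_k)."""
    n = len(p)
    return [[2.0 * B[j][k] / (p[j] + p[k]) for k in range(n)] for j in range(n)]

def residual(p, B, G):
    """Largest entry of |D G + G D - 2 B| for D = Diag(p)."""
    n = len(p)
    return max(abs(p[j] * G[j][k] + G[j][k] * p[k] - 2.0 * B[j][k])
               for j in range(n) for k in range(n))

def trace_of_product(X, Y):
    n = len(X)
    return sum(X[j][k] * Y[k][j] for j in range(n) for k in range(n))

def sld_metric(p, A, B):
    """Tr A G with G the solution of D G + G D = 2 B."""
    return complex(trace_of_product(A, solve_sld(p, B))).real

def metric_from_c(p, A, c):
    """sum_{j,k} c(p_j, p_k) |A_jk|^2."""
    n = len(p)
    return sum(c(p[j], p[k]) * abs(A[j][k]) ** 2
               for j in range(n) for k in range(n))

p = [0.5, 0.3, 0.2]
A = [[0.1, 0.2 + 0.1j, 0.0],
     [0.2 - 0.1j, -0.3, 0.05j],
     [0.0, -0.05j, 0.2]]  # traceless and selfadjoint

G = solve_sld(p, A)
print("residual of (15):", residual(p, A, G))
print("Tr A G          :", sld_metric(p, A, A))
print("sum c |A_jk|^2  :", metric_from_c(p, A, lambda x, y: 2.0 / (x + y)))
```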



The metrics $g^{h}$ and $g^{SL}$ appeared in connection with generalizations of
the Cramér-Rao inequality, and $g^{SL}$ plays an important role in the work of
Uhlmann, where he extends the Berry phase from pure states to mixed states. It
is rather instructive to have a look at the simple $2 \times 2$ case.

Dealing with $2 \times 2$ density matrices, we conveniently use the so-called
Stokes parametrization

$$D_x = \tfrac{1}{2}(I + x_1\sigma_1 + x_2\sigma_2 + x_3\sigma_3) = \tfrac{1}{2}(I + x \cdot \sigma), \tag{16}$$

where $\sigma_1, \sigma_2, \sigma_3$ are the Pauli matrices and
$(x_1, x_2, x_3) \in \mathbb{R}^3$ with $x_1^2 + x_2^2 + x_3^2 \le 1$. The
monotone metrics on $\mathcal{M}_2$ are rotation invariant in the sense that
they depend only on $r = \sqrt{x_1^2 + x_2^2 + x_3^2}$ and split into radial
and tangential components as follows:

$$ds^2 = \frac{1}{1 - r^2}\,dr^2 + \frac{1}{1 + r}\,g\!\left(\frac{1 - r}{1 + r}\right) dn^2, \quad\text{where}\quad g(t) = \frac{1}{f(t)}. \tag{17}$$
The radial component is independent of the function $f$. In the case of
$f(t) = (t + 1)/2$ the tangential component is constant. In the case of
$f(t) = 2t/(1 + t)$, $ds^2 = (1 - r^2)^{-1}(dr^2 + dn^2)$. Hence both the
smallest and the largest metrics possess a rather particular form.
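These statements about the $2 \times 2$ case can be checked with a few lines of Python. The sketch assumes the radial coefficient $1/(1 - r^2)$ and the tangential coefficient $\frac{1}{1+r}\,g\big(\frac{1-r}{1+r}\big)$ with $g = 1/f$; the function names are hypothetical. It also looks at the behaviour as $r \to 1$ for a function with $f(0) \ne 0$ and for one with $f(0) = 0$.

```python
import math

def radial_coefficient(r):
    """Coefficient of dr^2; it does not depend on f."""
    return 1.0 / (1.0 - r * r)

def tangential_coefficient(f, r):
    """Coefficient of dn^2: g((1 - r)/(1 + r)) / (1 + r) with g = 1/f."""
    t = (1.0 - r) / (1.0 + r)
    return 1.0 / ((1.0 + r) * f(t))

def f_arithmetic(t):
    return (1.0 + t) / 2.0

def f_harmonic(t):
    return 2.0 * t / (1.0 + t)

def f_log(t):
    return (t - 1.0) / math.log(t)  # tends to 0 as t -> 0

for r in (0.1, 0.5, 0.9):
    print(r,
          tangential_coefficient(f_arithmetic, r),   # constant
          tangential_coefficient(f_harmonic, r),     # equals 1 / (1 - r^2)
          radial_coefficient(r))

# Behaviour near the pure states (r -> 1).
for r in (1.0 - 1e-3, 1.0 - 1e-6, 1.0 - 1e-9):
    print(r, tangential_coefficient(f_arithmetic, r), tangential_coefficient(f_log, r))
```

For the arithmetic mean the tangential coefficient stays equal to 1, while for $f(t) = (t-1)/\log t$ it grows without bound, although only logarithmically.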

The limit of the tangential component exists when $r \to 1$ if $f(0) \ne 0$.
In this way the standard metric is obtained on the set of pure states, up to a
constant factor. In the case of larger density matrices, pure states form a
small part of the topological boundary of the invertible density matrices.
Hence, in order to speak about the extension of a Riemannian metric on
invertible densities to pure states, a rigorous meaning of the extension should
be given. This is the subject of the paper [16] and will be discussed in the
next section.

It is remarkable that quantum statistical mechanics seems to prefer another
metric, different from both the smallest and the largest one. This is termed the
Kubo-Mori or Bogoliubov metric, and sometimes the canonical correlation. In the
affine parametrization of the state space used above, the Kubo-Mori metric takes
the form

$$g_D^{KM}(A, B) = \int_0^\infty \mathrm{Tr}\,(D + t)^{-1} A (D + t)^{-1} B \,dt.$$
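For a diagonal $D = \mathrm{Diag}(p_1, \dots, p_n)$ the integral reduces to the scalar weights $\int_0^\infty \frac{dt}{(p_j + t)(p_k + t)}$ multiplying $A_{jk}B_{kj}$. The sketch below evaluates this weight numerically, with a Simpson rule after the substitution $t = u/(1-u)$ (a choice made here for convenience), and compares it with the closed form $(\log x - \log y)/(x - y)$, which is also $1/(y f(x/y))$ for $f(t) = (t - 1)/\log t$. All function names are illustrative.

```python
import math

def kubo_mori_weight_numeric(x, y, steps=2000):
    """Simpson approximation of the integral of 1/((x+t)(y+t)) over t >= 0.

    The substitution t = u / (1 - u) maps the half-line to [0, 1] and turns
    the integrand into 1 / ((x(1-u) + u)(y(1-u) + u)); steps must be even.
    """
    def integrand(u):
        return 1.0 / ((x * (1.0 - u) + u) * (y * (1.0 - u) + u))
    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def kubo_mori_weight_closed(x, y):
    """(log x - log y) / (x - y), with the value 1/x on the diagonal."""
    if x == y:
        return 1.0 / x
    return (math.log(x) - math.log(y)) / (x - y)

def c_from_logarithmic_mean(x, y):
    """c(x, y) = 1 / (y f(x / y)) with f(t) = (t - 1) / log t, for x != y."""
    t = x / y
    return 1.0 / (y * (t - 1.0) / math.log(t))

for x, y in ((0.5, 0.3), (0.2, 0.7), (0.4, 0.4)):
    print(x, y, kubo_mori_weight_numeric(x, y), kubo_mori_weight_closed(x, y))
```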

In order to see that this is the usual Kubo-Mori inner product, we rewrite it
in the logarithmic coordinate system instead of the affine one. In terms of the
inverse Kubo transforms

$$A' = \int_0^\infty (D + s)^{-1} A (D + s)^{-1}\,ds, \tag{18}$$

$$B' = \int_0^\infty (D + s)^{-1} B (D + s)^{-1}\,ds \tag{19}$$

we have

$$\langle A, B \rangle = \int_0^1 \mathrm{Tr}\,D^t A' D^{1-t} B'\,dt. \tag{20}$$

Theorem 3.4. Assume that a Fisher-adjusted monotone metric $g$ is obtained
from a smooth function $G : \mathbb{R}^+ \to \mathbb{R}$ by

$$g(A, B)(D) = \frac{\partial^2}{\partial t\,\partial s}\Big|_{t=s=0} \mathrm{Tr}\,G(D + tA + sB).$$

Then $g(A, B)$ is the Kubo-Mori inner product.

Proof. When $A$, $B$ and $D$ commute, we have

$$\frac{\partial^2}{\partial t\,\partial s}\Big|_{t=s=0} \mathrm{Tr}\,G(D + tA + sB) = \mathrm{Tr}\,G''(D)AB.$$

Since we assumed that the metric is Fisher-adjusted, $G''(t) = t^{-1}$, and we
have $G(t) = t\log t + C_1 t + C_2$ with constants $C_1$ and $C_2$; the
differentiation then gives the Kubo-Mori metric. □



The above proof also shows that the Kubo-Mori metric is the negative Hessian
of the von Neumann entropy functional on the state space. Recall that the von
Neumann entropy is the Boltzmann-Shannon entropy of the eigenvalues, that is,

$$S(D) := -\mathrm{Tr}\,(D \log D).$$

Differentiation of entropy-like functionals is a good method to obtain monotone
metrics. In one variable Theorem 3.4 does not allow many possibilities, but in
the two-variable case one can get more metrics. A typical two-variable entropy
is the relative entropy $\mathrm{Tr}\,(D_1(\log D_1 - \log D_2))$, which is a
member of the family of $\alpha$-entropies. If $-2 < \alpha < 2$, then

$$S_\alpha(D_1, D_2) = \frac{4}{1 - \alpha^2}\,\mathrm{Tr}\,\big(I - D_2^{\frac{1+\alpha}{2}} D_1^{-\frac{1+\alpha}{2}}\big) D_1 \tag{21}$$

is jointly convex. The metric

$$K_D^\alpha(A, B) = -\frac{\partial^2}{\partial t\,\partial u} S_\alpha(D + tA, D + uB)\Big|_{t=u=0} \tag{22}$$

was studied first by Hasegawa [6], [7] and its monotonicity was proved in [9]
and [8]. Note that the limit $\alpha \to \pm 1$ in the formulas recovers the
usual relative entropy and the Kubo-Mori metric. Since (22) is a monotone
metric, it is of real interest only on tangent vectors orthogonal to the
commutant of $D$:

$$K_D^\alpha(i[D, X], i[D, X]) = -\frac{4}{1 - \alpha^2}\,\mathrm{Tr}\,\big([D^{\frac{1-\alpha}{2}}, X][D^{\frac{1+\alpha}{2}}, X]\big), \tag{23}$$

where $X$ is selfadjoint. It is worthwhile to point out the similarity to the
skew information proposed by Wigner, Yanase and Dyson (apart from a constant
factor), see [17] or p. 49 in [13]. The operator monotone functions
corresponding to (22) are

$$f_\beta(x) = \beta(1 - \beta)\,\frac{(x - 1)^2}{(x^\beta - 1)(x^{1-\beta} - 1)},$$

where $\beta = (1 - \alpha)/2$.

The following characterization of the $\alpha$-metrics was obtained in [8].

Theorem 3.5. In the class of symmetric monotone metrics, the Wigner-Yanase-Dyson
skew information (i.e. the $\alpha$-metric (22)) is characterized by the
property that

$$K_\rho(A, B) = \frac{\partial^2}{\partial t\,\partial s}\,\mathrm{Tr}\,g(\rho + tA)\,g^*(\rho + sB)\Big|_{t=s=0}, \qquad A = i[\rho, X],\ B = i[\rho, Y],$$

for some smooth functions $g$ and $g^*$.
To prove this theorem we compute the Morozova-Chentsov function for the
metric determined by $g$ and $g^*$ and we get

$$c(\lambda, \mu) = \frac{(g(\lambda) - g(\mu))(g^*(\lambda) - g^*(\mu))}{(\lambda - \mu)^2}.$$

From the property $c(t\lambda, t\mu) = t^{-1}c(\lambda, \mu)$ we deduce that,
under the condition $g(0)g^*(0) = 0$, the relation
$g(t\lambda)g^*(t\lambda) = t\,g(\lambda)g^*(\lambda)$ must hold. This implies
that

$$g(x)\,g^*(x) = cx \qquad (x \in \mathbb{R}^+).$$

Another necessary condition comes from the property that
$\lim_{\lambda \to \mu} c(\lambda, \mu) = \mu^{-1}$. In this way, we arrive at
the condition

$$g'(x)\,g^{*\prime}(x) = x^{-1} \qquad (x > 0),$$

and the two displayed conditions together have the solution $g(x) = a x^p$ and
$g^*(x) = b x^{1-p}$, $ab = c = 1/(p(1 - p))$, together with the possible limits
$p \to 0$ or $p \to 1$, which allow $x$ and $\log x$.
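The solution can be checked numerically. The sketch below takes $g(x) = a x^p$ and $g^*(x) = b x^{1-p}$ with $ab = 1/(p(1-p))$, forms $c(\lambda, \mu)$ from the difference quotients, and tests the homogeneity, the diagonal limit $c(\mu, \mu) = \mu^{-1}$, and the symmetry $f(x) = x f(x^{-1})$ of the function $f(x) = 1/c(x, 1)$. The names and the sample values of $p$ are illustrative assumptions.

```python
def c_from_pair(lam, mu, p):
    """c(lam, mu) built from g(x) = a x^p, g*(x) = b x^(1-p), a b = 1/(p(1-p))."""
    ab = 1.0 / (p * (1.0 - p))
    return (ab * (lam ** p - mu ** p) * (lam ** (1.0 - p) - mu ** (1.0 - p))
            / (lam - mu) ** 2)

def f_from_pair(x, p):
    """The function f with c(x, y) = 1 / (y f(x / y)), obtained by setting y = 1."""
    return 1.0 / c_from_pair(x, 1.0, p)

for p in (0.2, 0.5, 0.8):
    lam, mu, t = 0.6, 0.25, 3.0
    print("p =", p)
    print("  homogeneity  :", c_from_pair(t * lam, t * mu, p), c_from_pair(lam, mu, p) / t)
    print("  diagonal     :", c_from_pair(mu * (1.0 + 1e-4), mu, p), 1.0 / mu)
    print("  symmetry of f:", f_from_pair(4.0, p), 4.0 * f_from_pair(0.25, p))
```

For $p = 1/2$ the function $f$ obtained in this way is $\big((1 + \sqrt{x})/2\big)^2$.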



4 Radial extension to pure states 



The idea behind the radial extension comes from the $2 \times 2$ case, where
the Stokes parametrization given by (16) identifies $\mathcal{M}_2$ with the
open unit ball in $\mathbb{R}^3$ and the pure states form the unit sphere. Let
us fix a point $P$ on the unit sphere (i.e. $P$ is a pure state) and a tangent
vector $A$ at $P$. Moreover, let $D$ be an element of the open unit ball other
than the origin such that $P$ and $D$ lie on the same radial line $R$. $P$ can
be thought of as the radial projection of $D$ to the boundary of the unit ball.
Define a tangent vector $\tilde{A}$ at $D$ such that $\tilde{A}$ is orthogonal
to $R$ and the endpoints of $A$ and $\tilde{A}$ lie on the same radial line.
$\tilde{A}$ can be thought of as a lift of $A$ with respect to the radial
projection. Differential geometers call such lifted vectors 'horizontal
vectors', and vectors tangent to the radius at $D$ are called 'vertical
vectors'. Now one can take the inner product $g_D(\tilde{A}, \tilde{B})$ of two
lifts $\tilde{A}, \tilde{B}$ of $A, B$ at $D$ with respect to a monotone
Riemannian metric $g$ and ask for conditions for the existence of the limit of
$g_D(\tilde{A}, \tilde{B})$ as $D$ goes to $P$ along the radius $R$.

In the general case the radial projection is defined on an open and dense
subset $\mathcal{M}'_n$ of $\mathcal{M}_n$, where $\mathcal{M}'_n$ is formed by
the non-degenerate elements of $\mathcal{M}_n$, i.e. matrices whose eigenvalues
are all distinct. Now the radial projection $\pi$ is a smooth mapping from
$\mathcal{M}'_n$ into the pure states $\mathcal{P}$ such that $\pi(D)$ is the
projection onto the one-dimensional eigenspace corresponding to the largest
eigenvalue of $D$. The idea of this projection is that if $D$ is "near" to a
pure state, then the largest eigenvalue of $D$ is near to 1 and the
corresponding eigenspace is one-dimensional.

It can be proved that $\mathcal{M}'_n$ is a fibre bundle over $\mathcal{P}$ with
projection $\pi$, and in the $2 \times 2$ case the fibres are exactly the radii.
If $\pi_{*,D}$ denotes the tangent map of $\pi$ at $D$, then the vertical space
is $\mathrm{Ker}\,\pi_{*,D}$ and the horizontal space $H_D$ is the orthogonal
complement of $\mathrm{Ker}\,\pi_{*,D}$ with respect to a fixed monotone
Riemannian metric $g$. Since $\pi_{*,D}$ is surjective, the restriction of
$\pi_{*,D}$ to the horizontal space gives a linear isomorphism between $H_D$ and
the tangent space of $\mathcal{P}$ at $\pi(D)$. Thus for any tangent vector $A$
at $\pi(D)$ there exists a unique lift $\tilde{A}$ at $D$ such that
$\pi_{*,D}(\tilde{A}) = A$.

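If $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_1$ is the
largest eigenvalue, then the vertical vectors at $D$ are identified with vectors
of the following form

$$\begin{pmatrix} x_{11} & 0 & \cdots & 0 \\ 0 & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & x_{n2} & \cdots & x_{nn} \end{pmatrix}$$

and the horizontal vectors have the form

$$\begin{pmatrix} 0 & \bar{u}_2 & \cdots & \bar{u}_n \\ u_2 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ u_n & 0 & \cdots & 0 \end{pmatrix}. \tag{24}$$

The tangent vectors at the pure state $\pi(D) = \mathrm{Diag}(1, 0, \dots, 0)$
also have the same form, and the lift of a tangent vector is given by

$$\begin{pmatrix} 0 & (\lambda_1 - \lambda_2)\bar{u}_2 & \cdots & (\lambda_1 - \lambda_n)\bar{u}_n \\ (\lambda_1 - \lambda_2)u_2 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ (\lambda_1 - \lambda_n)u_n & 0 & \cdots & 0 \end{pmatrix},$$

which is independent of the choice of $g$. Now the precise definition of the
radial extension is the following.

Definition. We say that a smooth metric $k$ on $\mathcal{P}$ is the radial
extension of $g$ if for every $P \in \mathcal{P}$, for every pair of tangent
vectors $A, B$ at $P$ and for every sequence $D_m \to P$ such that
$\pi(D_m) = P$,

$$\lim_{m \to \infty} g_{D_m}(\tilde{A}, \tilde{B}) = k_P(A, B). \tag{25}$$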
Using (25) one can compute $g_D(\tilde{A}, \tilde{B})$:

$$g_D(\tilde{A}, \tilde{B}) = 2\,\mathrm{Re} \sum_{i=2}^n \frac{(\lambda_1 - \lambda_i)^2}{f(\lambda_i/\lambda_1)\,\lambda_1}\,\bar{u}_i v_i,$$

where $f$ is the operator monotone function corresponding to the metric and
$u_i$, $v_i$ for $i = 2, \dots, n$ are the matrix elements of the horizontal
vectors $A$, $B$ as in (24). From this expression the following result is
easily obtained.

Theorem 4.1. Let $g$ be a monotone Riemannian metric on $\mathcal{M}_n$ and let
$f : \mathbb{R}^+ \to \mathbb{R}^+$ be the corresponding operator monotone
function. The radial extension $k$ of $g$ exists if and only if $f(0) \ne 0$.
In this case $k = h/f(0)$, where $h$ is the canonical Riemannian metric on
$\mathcal{P}$, the so-called Fubini-Study metric.
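The dichotomy of Theorem 4.1 can be observed numerically. The sketch below rests on assumptions stated here: the inner product of two lifts at $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$ is taken to be $2\,\mathrm{Re}\sum_{i \ge 2} \frac{(\lambda_1 - \lambda_i)^2}{\lambda_1 f(\lambda_i/\lambda_1)}\,\bar{u}_i v_i$, which is what formula (10) gives on the lifted vectors, and the Fubini-Study metric is normalized as $h(A, B) = \mathrm{Tr}\,AB = 2\,\mathrm{Re}\sum_i \bar{u}_i v_i$. The family `near_pure` and the vectors `u`, `v` are illustrative.

```python
def lifted_inner_product(eigenvalues, u, v, f):
    """g_D of the lifts of two tangent vectors with entries u_i, v_i (i >= 2)."""
    lam1 = eigenvalues[0]
    total = 0.0
    for lam, ui, vi in zip(eigenvalues[1:], u, v):
        weight = (lam1 - lam) ** 2 / (lam1 * f(lam / lam1))
        total += weight * (ui.conjugate() * vi).real
    return 2.0 * total

def fubini_study(u, v):
    """h(A, B) = Tr A B = 2 Re sum_i conj(u_i) v_i at the pure state."""
    return 2.0 * sum((ui.conjugate() * vi).real for ui, vi in zip(u, v))

def near_pure(eps):
    """Eigenvalues of a non-degenerate 3 x 3 density matrix close to a pure state."""
    return [1.0 - eps, 2.0 * eps / 3.0, eps / 3.0]

def f_arithmetic(t):
    return (1.0 + t) / 2.0  # f(0) = 1/2

def f_harmonic(t):
    return 2.0 * t / (1.0 + t)  # f(0) = 0

u = [1.0 + 2.0j, 0.5j]
v = [0.3 - 1.0j, 2.0 + 0.0j]

for eps in (1e-3, 1e-6, 1e-9):
    print(eps,
          lifted_inner_product(near_pure(eps), u, v, f_arithmetic),
          lifted_inner_product(near_pure(eps), u, u, f_harmonic))
print("h / f(0) for the arithmetic mean:", fubini_study(u, v) / f_arithmetic(0.0))
```

The first column should approach $h(A, B)/f(0)$, while the second, computed for $f(0) = 0$, grows without bound.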